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J>> University of California, Berkeley 

For population genetics models with recombination, obtaining 
an exact, analytic sampling distribution has remained a challenging 
^SJ ■ open problem for several decades. Recently, a new perspective based 

on asymptotic series has been introduced to make progress on this 
problem. Specifically, closed-form expressions have been derived for 
the first few terms in an asymptotic expansion of the two-locus sam- 
pling distribution when the recombination rate p is moderate to large. 



r^ ■ In this paper, a new computational technique is developed for find- 



ing the asymptotic expansion to an arbitrary order. Computation in 
this new approach can be automated easily. Furthermore, it is proved 
here that only a finite number of terms in the asymptotic expansion 
is needed to recover (via the method of Fade approximants) the ex- 

r\l ■ act two-locus sampling distribution as an analytic function of p; this 

^ ' function is exact for all values of p £ [0, oo). It is also shown that the 

t'*^ , new computational framework presented here is flexible enough to 

0^ ■ incorporate natural selection. 

00 

rn 

r — I ■ 1. Introduction. Central to many applications in genetic analysis is the 

^^ I notion of sampling distribution, which describes the probability of observ- 

ing a sample of DNA sequences randomly drawn from a population. In the 
one-locus case with special models of mutation such as the infinite-alleles 
model or the finite-alleles parent-independent mutation model, closed-form 
^ ■ sampling distributions have been known for many decades [Wright (1949), 

Ewens (1972)]. In contrast, for multi-locus models with finite recombina- 
tion rates, finding a closed-form sampling distribution has so far remained 
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a challenging open problem. To make progress on this long-standing issue, 
we recently proposed a new approach based on asymptotic expansion and 
showed that it is possible to obtain useful analytic results when the recom- 
bination rate is moderate to large [Jenkins and Song (2009, 2010)]. More 
precisely, our previous work can be summarized as follows. 

Consider a two-locus Wright-Fisher model, with the two loci denoted by 
A and B. In the standard coalescent or diffusion limit, let 6a and 6b denote 
the population-scaled mutation rates at loci A and B, respectively, and let p 
denote the population-scaled recombination rate between the two loci. Given 
a sample configuration n (defined later in the text), consider the following 
asymptotic expansion of the sampling probability q{n \ 6a, 6b, p) for large p: 

I \Q a \ ( \a a \ ^ qi{Ta.\6A,6B) 
g(n dA,6B,P) = go(n 6a,6b)^ 

(1.1) 

g2(n| 6a,6b) 

p^ 

where the coefficients qo{n \ 6a,6b),(Ii{^ \ 6a,6b),Q2{^ \ 6a, 6b), ■ ■ ■ , are in- 
dependent of p. The zeroth-order term qq corresponds to the contribution 
from the completely unlinked (i.e., p = oo) case, given simply by a product 
of marginal one-locus sampling distributions. Until recently, higher-order 
terms (7M(n | 6a, 6b), for M > 1, were not known. In Jenkins and Song 
(2010), assuming the infinite-alleles model of mutation at each locus, we 
used probabilistic and combinatorial techniques to derive a closed-form for- 
mula for the first-order term qi, and showed that the second-order term 52 
can be decomposed into two parts, one for which we obtained a closed-form 
formula and the other which satisfies a simple recursion that can be easily 
evaluated using dynamic programming. We later extended these results to 
an arbitrary finite-alleles model and showed that the same functional form 
of qi is shared by all mutation models, a property which we referred to as 
universality [see Jenkins and Song (2009) for details]. Importantly, we also 
performed an extensive study of the accuracy of our results and showed that 
they may be accurate even for moderate values of p, including a range that 
is of biological interest. 

Given the above findings, one is naturally led to ask several important 
follow-up questions. In particular, the following questions are of both theo- 
retical and practical interest. 

1. Is it possible to compute the higher-order coefficients qM{n \ 6a, 6b) for 
M>2? 

2. For a given finite p > 0, does the series in (1.1) converge as more terms 
are added? 

3. If not, how should one make use of the asymptotic expansion in practice? 

4. Is it possible to incorporate into the asymptotic expansion framework 
other important mechanisms of evolution such as natural selection? 
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In this paper, we develop a new computational technique to answer the 
above questions. Our previous method requires rewriting complex recursion 
relations into more structured forms, followed by laborious computation of 
the expectation of increasingly complicated functions of multivariate hyper- 
geometric random variables. Generalizing that method to go beyond the 
second order (i.e., M > 2) seems unwieldy. In contrast, our new method is 
based on the diffusion process and it utilizes the diffusion generator to or- 
ganize computation in a simple, transparent fashion. Moreover, the same 
basic procedure, which is purely algebraic, applies to all orders and the 
computation can be completely automated; we have, in fact, made such an 
implementation. 

To recapitulate, we propose here a method of computing the asymptotic 
expansion (1.1) to an arbitrary order. That is, for any given positive inte- 
ger M, our method can be used to compute the coefficients gfc(n | 6a, Ob) for 
all k < M; Theorem 3.1 summarizes this result. As discussed in Section 6.2, 
however, one can find examples for which the series (1.1) diverges for fi- 
nite, nonzero p. To get around this problem, we employ the method of Fade 
approximants. The key idea behind Fade approximants is to approximate 
the function of interest by a rational function. Although (1.1) may diverge, 
we show that the sequence of Fade approximants converges for all values of 
p> 0. In fact, for every sample configuration n, we show that there exists 
a finite positive integer C(n), such that the Fade approximant of the asymp- 
totic expansion up to order > C(n) is equal to the exact two-locus sampling 
distribution. Hence, our result implies that only a finite number of terms 
in the asymptotic expansion need to be computed to recover (via the Fade 
approximant) the exact sampling distribution as an analytic function of p; 
this function is exact for all values of p, including 0. Theorem 4.1 and the 
surrounding discussion lay out the details. Last, we also show in this paper 
that our new framework is flexible enough to incorporate a general model 
of diploid selection. This extension is detailed in Section 5. 

The above-mentioned convergence result is theoretically appealing. For 
practical applications, however, one needs to bear in mind that the value 
of C(n) generally grows with sample size, thus implying that obtaining an 
exact, analytic sampling distribution may be impracticable for large sam- 
ples. A possible remedy, which works well in practice, is to compute the 
asymptotic expansion only up to some reasonable order M < C(n), and use 
the corresponding Fade approximant as an approximate sampling distribu- 
tion. We show in Section 6 that using M = 10 or so produces quite accurate 
results. 

An important advantage of our method over Monte Carlo-based meth- 
ods is that, for a given mutation model, the bulk of the computation in 
our approach needs to be carried out only once. Specifically, the coefficients 
(7fc(n I 9a, 6b) need to be computed only once, and the same coefficients 
can be used to evaluate the sampling distribution at different values of the 
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recombination rate p. We expect this aspect of our work to have impor- 
tant practical imphcations. For example, in the composite likelihood method 
for estimating fine-scale recombination rates [Hudson (2001), McVean et al. 
(2004)], one needs to generate exhaustive lookup tables containing two-locus 
sampling probabilities for a wide range of discrete p values. An alternative 
approach would be to store the coefficients ^^(n | 9 a, Ob) instead of gen- 
erating an exhaustive lookup table using importance sampling, which is 
computationally expensive. 

The rest of this paper is organized as follows. In Section 2, we lay out the 
notational convention adopted throughout this paper and review our previ- 
ous work on asymptotic expansion of the two-locus sampling distribution up 
to second order. Our new technique for obtaining an arbitrary-order asymp- 
totic expansion is described in Section 3, where we focus on the selectively 
neutral case. In Section 4, we present the method of Pade approximants 
and describe the aforementioned result on convergence to the exact sam- 
pling distribution. In Section 5, we describe how natural selection can be 
incorporated into our new framework. Finally, we summarize in Section 6 
our empirical study of the accuracy of various approximate sampling dis- 
tributions and provide in Section 7 proofs of the main theoretical results 
presented in this paper. 

2. Notation and review of previous work. In this section, we introduce 
some notation and briefly review previous results on asymptotic sampling 
distributions. Initial results were obtained for the infinite- alleles model of 
mutation [Jenkins and Song (2010)] and later generalized to an arbitrary 
finite-alleles model [Jenkins and Song (2009)]. In this paper we focus on the 
latter case. 

The set of nonnegative integers is denoted by N. Given a positive inte- 
ger k, [k] denotes the set {1, . . . , A;}. For a nonnegative real number z and 
a positive integer n, {z)n := z{z + 1) • • • (z + n — 1) denotes the nth ascend- 
ing factorial of z. We use to denote either a vector or a matrix of all 
zeroes; it will be clear from context which is intended. Throughout, we con- 
sider the diffusion limit of a haploid exchangeable model of random mating 
with constant population size 2N . We refer to the haploid individuals in 
the population as gametes. Initially we shall assume that the population is 
selectively neutral; we extend to the nonneutral case in Section 5. 

2.1. One-locus sampling distribution. The sample configuration at a lo- 
cus is denoted by a vector n = (ni, . . . , uk), where ni denotes the number of 
gametes with allele i at the locus, and K denotes the number of distinct pos- 
sible alleles. We use n = X]i=i ^i to denote the total sample size. Let u denote 
the probability of mutation at the locus per gamete per generation. Then, in 
the diffusion limit, A^ — )■ oo and u — )• with the population-scaled mutation 
rate = ANu held fixed. Mutation events occur according to a Poisson pro- 
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cess with rate 0/2, and allelic changes are described by a Markov chain with 
transition matrix P = (Pij); that is, when a mutation occurs to an allele i, 
it mutates to allele j with probability Pij. 

We denote by p(n) the probability of obtaining the unordered sample 
configuration n. When writing sampling probabilities, we suppress the de- 
pendence on parameters for ease of notation. By exchangeability, the prob- 
ability of any ordered configuration corresponding to n is invariant under 
all permutations of the sampling order. We may, therefore, use q{n) without 
ambiguity to denote the probability of any particular ordered configuration 
consistent with n. The two probabilities are related by 

p(n) = — — '■ -q{n). 

nil ■ ■ -riK- 

Throughout this paper, we express our results in terms of ordered samples 
for convenience. 

Consider an infinite population specified by the population-wide allele 
frequencies x = (xj, . . . , xk), evolving according to a Wright -Fisher diffusion 
on the simplex 

(2.1) Ak=U= (Xi) G [0, 1]^^ :^Xi = 1 l 

We assume that a sample is drawn from the population at stationarity. No 
closed-form expression for q{n) is known except in the special case of parent- 
independent mutation (PIM), in which Pij = Pj for all i. In the PIM model, 
the stationary distribution of x is Dirichlet with parameters {9 Pi, ■ ■ ■ , OPk) 
[Wright (1949)], and so (?(n) is obtained by drawing an ordered sample from 
this population: 



(2.2) g(n)=E 



K 



n^r 



.i=l 






This sampling distribution can also be obtained by coalescent arguments 
[Griffiths and Tavare (1994)]. 

2.2. Two loci. We now extend the above notation to two loci, which we 
refer to as A and B. Denote the probability of a recombination event be- 
tween the two loci per gamete per generation by r. In the diffusion limit, as 
A'' — )• oo we let r — 7- such that the population-scaled recombination param- 
eter p = ANr is held fixed. Suppose there are K possible alleles at locus A 
and L possible alleles at locus B, with respective population-scaled muta- 
tion parameters 6a and 9b, and respective mutation transition matrices P 
and P . The two- locus sample configuration is denoted by n = (a, b, c), 
where a = (ai, . . . , ax) with Oj being the number of gametes with allele i at 
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locus A and unspecified alleles at locus B; h = (bi, . . . , bi) with bj being the 
number of gametes with unspecified alleles at locus A and allele j at locus B; 
c = (cij) is a K X L matrix with Cij being the multiplicity of gametes with 
allele i at locus A and allele j at locus B. We also define 



K L K 



i=l j=l i=i j=l 



T.b. 



K 

i=l 



and use c^ = (cj.)jg[j^] and c^ = {c-j)j£[l] to denote the marginal sample 
configurations of c restricted to locus A and locus i?, respectively. Notice 
the distinction between the vectors a and b, which represent gametes with 
alleles specified at only one of the two loci, and the vectors ca and c^, which 
represent the one-locus marginal configurations of gametes with both alleles 
observed. 

When we consider the ancestry of a sample backward in time, a gamete 
may undergo recombination between the two loci, with each of its two par- 
ents transmitting genetic material at only one of the two loci. We allow the 
nontransmitting locus to remain unspecified as we trace the ancestry further 
back in time. 

Denote by g(a,b,c) the sampling probability of an ordered sample with 
configuration (a, b,c), again suppressing the dependence on parameters for 
ease of notation. Sampling is now from a two-dimensional Wright-Fisher 
diffusion with population allele frequencies x= {xij)i^[K],i^[L]i evolving on 
the state space 

(2.3) Ai^xL = I X = {xij) G [0, 1]^^^ : X^ X^^ij = 1 [ • 

As before, g(a, b,c) is specified by drawing an ordered sample from the 
population at stationarity: g(a,b,c) =E[F(X;n)], where 

(2.4) F(x;n) = 

with Xi. = Y^-^ Xij and x.j = '^^^^Xij. In the two- locus model with 0<p<oo, 
the stationary distribution, and hence, the sampling distribution, is not 
known in closed- form even when the mutation process is parent-independent. 
However, when p = oo, the two loci become independent and q{a, b, c) is sim- 
ply the product of the two marginal one- locus sampling distributions. More 
precisely, denoting the one-locus sampling distributions at A and B by q^ 
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and q^ , respectively, we have 

lim g(a, b, c) = q^{a. + c^)g'^(b + c^) 

p— >oo 

for all mutation models [Ethier (1979)]. In particular, if mutation is parent- 
independent, then we do have a closed-form formula for g(a, b,c) when 
p = oo, since from (2.2) we know that 

K L 

(2.5) q\^)= \[{eAPt)a. and q^{h) = —^\[{eBPf),,. 



\ I l\"^^j /"i — "" 1 \"J in 



B b 



i=i 



2.3. Asymptotic sampling formula. As mentioned in the Introduction, 
although a closed-form formula for ^(a, b, c) is not known for an arbitrary p, 
previously we [Jenkins and Song (2009, 2010)] were able to make progress 
by posing, for large p, an asymptotic expansion of the form 

/oc^ ( x^ \ ^ u \ , '?i(a't»,c) gr2(a,b,c) 

(2.6) g(a, b, c) = go(a, b, c) + + ^ + ' ' ' ' 

P P 

where the coefficients ^^(a, b,c), for all fc > 0, are independent of p. We 
summarize our previous results in the following theorem, specialized to the 
case of finite-alleles mutation, which is our interest here. 

Theorem 2.1 [Jenkins and Song (2009)]. In the asymptotic expan- 
sion (2.6) of the neutral two-locus sampling formula, the zeioth-order term 
is given by 

(2.7) go(a, b, c) = g^(a + CA)q^ih + cb), 
and the first- order term is given by 

qiiai,h,c)=('^U^{ai+CA)q^{h + CB) 



(2.^ 



-g^(b + CB)X;(t)9^(a + CA-e.) 
-/(a + CA)X;('^)g''(b + CB-e,) 

K L . . 

+ E E (''2 J ^^("^ + ^^ - '^^^'^''(^ + ^^ - *'^)' 



i=l j=l 

where ej is a unit vector with a 1 at the ith entry and O's elsewhere. Fur- 
thermore, the second-order term can be decomposed as 

(2.9) q2{a., b, c) = a(a, b, c) + q2{a. + ca, b + c^, 0), 
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where a{a.,h,c) is known analytically and g2(a, b, 0) satisfies the recursion 
relation 

[a{a + 9A-l)+b{b + 9B-l)]q2ia,h,0) 

K 

ai{ai - l)g2(a-ei,b,0) 



i=l 



L 



+ XI ^J (^J - 1)^2 (a, b - Gj , 0) 

K K 

(2.10) +9AXai5^^if'?2(a-ei + ei,b,0) 

i=l t=l 
L L 

+ ^bY^ bj Y^ Pif 92 (a, h-ej+et,0) 

j=i t=i 

K L 

+ 4^j;a.6,[(a-l)(6-l)/(a)g^(b) 
i=i j=i 

_(6_l)(a._l)/(a-ei)(?^(b) 

_(a-l)(6,-l)/(a)g^(b-e,) 

+ {oi - 1)(6, - l)g^(a - ei)q^{h - e,)] 

mt/i boundary conditions 52(^4, 0,0) = 0, 52(0,6^,0) = and 52(^1, ej,0) = 
for all i S [K] and j G [L] . 

The expression for (T(a, b, c) can be found in Jenkins and Song (2009) and 
we do not reproduce it here. Notice that go(a, b, c) and gi(a, b,c) exhibit 
universality: their dependence on the model of mutation is subsumed entirely 
into the one-locus sampling distributions. 

The proof of Theorem 2.1 used coalescent arguments. By considering the 
most recent event back in time in the coalescent process for the sample, it is 
possible to write down a recursion relation for the sampling distribution. In 
a two-locus, finite-alleles model, the appropriate recursion is a simple modifi- 
cation of the one introduced by Golding (1984) for the m/imie-alleles model, 
also studied in detail by Ethier and Griffiths (1990). By substituting (2.6) 
into this recursion, after some lengthy algebraic manipulation one can obtain 
the expressions given in Theorem 2.1 [see Jenkins and Song (2009, 2010) for 
details] . 

3. Arbitrary-order asymptotic expansion. The approach described in 
Section 2.3 does not generalize easily. In what follows, we introduce a new 
approach based on the diffusion approximation. This new method is more 



K L f) ^ 

.k=l 1=1 ^^'' k=l 



d 

UXij 
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transparent and more easily generalizable than the one used previously, and 
we illustrate this point by developing a method for computing the asymp- 
totic expansion (2.6) to an arbitrary order. 

3.1. Diffusion approximation of the two-locus model. Our approach is 
based on the diffusion process that is dual to the coalescent process. The 
generator for the two-locus finite-alleles diffusion process is 

K L 

(3.1) 

L 

+ &bY^ xa{P/j - Sji) + p{xi.x. 
1=1 

where (5jfc is the Kronecker delta. For notational convenience, henceforth, 
where not specified otherwise, the indices i and k are assumed to take value 
in [K] , while the indices j and / are assumed to take value in [L] . 

In what follows, we change to a new set of variables that capture the 
decay of dependence between the two loci, an approach originally due to 
Ohta and Kimura (1969a, 1969b). Specifically, the key quantity of interest 
is the following definition. 

Definition 3.1. The linkage disequilibrium (LD) between allele i at 
locus A and allele j at locus B is given by 

O'ij — Xij Xi-X.j. 

The collection of (K -|- 1)(L -|- 1) — 1 new variables is 

{xi.,. ..,xk;x.i,.. .,x.l; dii, . . -jcIkl)- 

The diffusion is then constrained to the (KL — l)-dimensional simplex AkxL 
by imposing the conditions 

K L K 

i=l 7=1 j=l 

(3.2) 

L 

y^dij = V-i. 
i=i 

3.2. Rescaling LD. Since we are interested in the large p limit, we expect 
each dij to be small. We introduce the rescaling dij = y/pdij. The reason for 
this choice should become clear from (3.3) below; in the resulting generator, 
there should be a nontrivial interaction between recombination and genetic 
drift, that is, they should both act on the fastest timescale. The leading-order 
contribution to the generator, denoted below by L^"^' , has contributions from 
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both of these biological processes if and only if we use this choice for dij . See 
Song and Song (2007) for another example of this rescaling. By substituting 
for the new variables in (3.1) and using (3.2) for extensive simplification, 
the generator can be expressed as 



(3.3) 



-4 



P 



l(2) + ^l« + lW + — l(-i 



Vp 



where the operators in (3.3) are given by 

^(2) ^ ^ ) ^.^ . ^(J.^ _ Xk.){6jl - x.i) 



«j 



d 



k,l 



dd 



kl 



d 

ddij 



L^^^ = ^ [2dij{5ik - Xk.){6ji - x.i) - 6ik6jidij + 2diiXk.x.j] 

i,j,k,l 



d^ 



L(0) = - ^ d^jdkl 
i,j,k,l 



d^ 



ddki ddij 



+ 2 y][((5ifc - Xi.)dki - diiXk] 
ddki ddij i k I ddki oxi 



d^ 



ddki dx.j 



+ 2 ^[(<^ji - x.j)dki - dkjX.i] 

j,k,l 

+ y: ^^- is^>^ - ^fc) q£q^ + e ^■.- ('^.' - -') a,., 9,,^ 



d^ 



i,k 



hi 



Y, eAJ2dkJiPk^^-^^k) + 9BJ2diliPS-^: 



'jU 



*J 



+ ^E-^(^^1-^^^)7^ + ^E-'(^^f-M 



- 2dij 


d 

ddij 


d 





i,k 



hi 



dx. 



L(-^)=2j:4 



52 



«J 



dxi. dx.j 



This generator extends that of Ohta and Kimura (1969a) from a (2 x 2)- 
to a {K X L)-allele model. Note that ours differs from that of Ohta and 
Kimura (1969a) by a factor of two; one unit of time corresponds to 2N 
(rather than N) generations in our convention. 

Recall that our interest is in calculating the expectation at stationarity 
of the function -F(x; n) shown in (2.4), which is now viewed as a function of 

X = (xi. , . . . , Xi^.; X.I , . . . , x.l; dii , . . . , dKh)- 

In the same way that the multiplicity matrix c represents multinomial sam- 
ples from a population with frequencies {xij)i^[K],j£[L]-: we introduce an anal- 
ogous matrix r = {rij)i^\^x],j(^[L] associated with the variables (c^ij)iG[_ft'],je[i/]- 
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We further define the marginal vectors r^ = (rj.)jg[j^] and r^ = (?"j)je[L]! 
where rj. = 'Ylij'''ij ^^^"^ '''■j — Yli'''ij-> analogous to ca and cb- In this nota- 
tion, the function -F(x; n) becomes 



K 



K L 



(3.4) 



where 



F(s;n)=(n<' n-1 nn 



" 1 r 

2^ pm/2 2-^ ii 
m=0 ^ r&Tm '- hj 



Vp 



+ Xi.X 



l--^-J 



G^"") (x; a + CA - r^, b + Cb - ru , r) 



(3.5) GM(x;a,b,r)= fn-?l ffl-^l (flU^^^ 

and the inner summation in (3.4) is over all i^ x L matrices r of nonnegative 
integers whose entries sum to m: 

Note that only those matrices which form "subsamples" of c have nonzero 
coefficient in (3.4); that is, < rjj < Cij for all i and j. 

3.3. The key algorithm. We now pose an asymptotic expansion for the 
expectation E[G(™)(X;a,b,r)]: 

(m) / r^ ^ 

E[G(-) (X; a, b, r)] = g^T^ (a, b, r) + a^J^hll 

yP 
(3-6) 

ff2 (a,b,r) 



+ 



P 



+ 



so that, using (3.4), the quantity of interest is given by 

g(a,b,c) = E[F(X;n)] 



(3.7) 



EE 

m=0 rePm 



■ K L 

nn 



■y 



Y^ g^^^{a + CA -rA,h + CB- r^, r) 
^2^ 



M=0 



->{'/n+u)/2 



We also have the boundary conditions 

(3.8) q{e„0,0)=7rf, g(0,e,-,0) = vrf , g(ei,e„0) = ^.^vrf , 
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£: 1 2 3 4 

," 1 ," 2 3 ," 4 ," 



9i' 







9i 



(c) 



92 



(c) 
93 



.(■=) 



J™) 



Fig. 1. Computation of gi, for each m = 0, . . . , c and u = 0, 1, When m > 0, t/ie 



Jm) 



tenn o„ residing on level £ — m + u is determined by up to four entries on level ■ 
This is illustrated for the example g^ 



(2) 



(3.9) 






■ T^n 



gy{ei,ej,Q) 










Vu>l, 
Vu>l, 
Vu> 1. 



where tt^ = (vr^)ig[x] and ir^ = (7r^)jg[L] are the stationary distributions 
of P and P^ , respectively. 

Using (3.7) and (3.8), we can also assign boundary conditions for each 

gi°)(a,b,0): 

fff (e,,0,0) = 
fff (0,e,-,0) = 

5f (ei,e,-,0) = 

We have reduced the problem of computing an asymptotic expansion 
for (/(a, b,c) to one of computing gr^ (a, b, r) for each m and u. Consider 
arranging these quantities in a c x N array, as illustrated in Figure 1. Refer to 
entries on the £th anti-diagonal, such that m-\-u = i, as residing on the ^th 
level. As is clear from (3.7), the contribution in the expansion for g(a, b,c) 
of order p~ '"^ is comprised of entries on level i. For convenience, we define 
5f„ (a, b,r) = if w < 0, or ?tt, < 0, or if any entry ai,bj,rij < 0. Then the 
following theorem, proved in Section 7.1, enables the level-wise computation 

of each gu ■ 



ANALYTIC TWO-LOCUS SAMPLING DISTRIBUTIONS 13 

Theorem 3.1. The term gu {a., h,r) in the right-hand side of (3.6) is 
determined as follows. 

(i) For m = and u = 0, gf\si,h,Q) = g^(a)g^(b). [Recall that g^(a) 
and q^{h) are the respective one-locus sampling distributions at locus A and 
locus B.] 

(ii) For m> and u>0, gu {a, h,r) on level £ is determined by at 

most four entries on level i— 2; these are gu , 5n-i ' 5^-2 '^'^^ 5m-3 ■ 
This relationship is given explicitly [equation (7.2)] in Section 7. 

(iii) For ?7i = and u>l, gu (a, b,0) on level £ is determined by gl^_i, 

also on level £, and other similar terms gu (a',b',0), where a' <a, b' < b. 
This relationship is given explicitly [equation (7.3)] in Section 7. 

For odd levels (i.e., £ = 1, 3, 5, . . .), the above theorem implies the following 
vanishing result, which we prove in Section 7.2. 

Corollary 3.1. If m + u = i is odd, then gu (a, b, r) = 0, for all con- 
figurations (a, b, r) . 

Incidentally, Corollary 3.1 implies that only integral powers of p have 
nonzero coefficients in (3.7). 

Given the entries on level i — 2, Theorem 3.1 provides a method for com- 
puting each of the entries on level i. They can be computed in any order, 
apart from the slight complication of (iii), which requires knowledge of guli 

as a prerequisite to gii . The expression for gu (a, b,0) is only known re- 
cursively in a and b, and we do not have a closed- form solution for this 
recursion for u > 4 and even. Equation (2.10) provides one example, which 

turns out to be the recursion for g^ (a, b, 0). If the marginal one-locus sam- 
pling distributions are known, the complexity of computing glj"* (a, b,r) for 
m > does not depend on the sample configuration (a, b, r). In contrast, the 

complexity of computing gu (a, b, 0) depends on a and b, and the running 
time generally grows with sample size. However, in Section 6 we show that 
ignoring gu (a, b,0) generally leads to little loss of accuracy in practice. 

To illustrate the method, entries on the first two even levels are summa- 
rized in Table 1. These recapitulate part of the results given in Theorem 2.1. 
The last column of Table 1 gives the "contribution" to (2.6) from gu (a, b, r) 
(for fixed u and m and summing over the relevant r). According to (3.7), 
this quantity is 

K L ^ 

glf^^ {a + CA - rA,h + Cb - rB,r) 



(3.10) V 



re-Pm 



nn 
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Table 1 
Entries on levels l — Q,2 of the array {Qu ) 



g„ (a, b,r) Contribution to (2.6) 



nU.X:,'U,K'] 9o(a,b,c) 

2 2 E[X..X,(fe-Afc.)fe-XOn„^:"n„X';"l, ?i(a,b,c) 

where r = Sij + Ski 
2 11 

2 2 



We have also checked that the total contribution from entries on level i = 4 
is equal to g2(a, b,c), as given in Theorem 2.1. We note in passing that 
Theorem 3.1 makes it transparent why q2{a,h,c) is not universal in the 
sense described in Section 2.3: expressions on level i = 4 depend directly 
on L^^' , which in turn depends upon the model of mutation. By contrast, 
the nonzero contribution to gi(a, b,c), for example, is determined by L^'^' , 
which does not depend on the model of mutation. 

It is important to emphasize that gu (a, b, r) is a function of the vec- 
tors a, b, the matrix r and (implicitly) the parameters 9a and 9b- The 
relationships given in Theorem 3.1 are thus functional, and only need to 
be computed once. In other words, all of these arguments of gu can re- 
main arbitrary. It is not necessary to redo any of the algebraic computations 
for each particular choice of sample configuration, for example. Moreover, 
the solutions to each gu {a, h,r) are expressed concisely in terms of the 
marginal one-locus sampling distributions q and q ; this fact follows in- 
ductively from the solution for ^'q (a, b,0). Unlike the method of Jenkins 
and Song (2009, 2010), the iterative procedure here is essentially the same 
at every step. 

4. Partial sums, optimal truncation and Pade approximants. In princi- 
ple, the procedure described in the previous section provides a method of 
computing an arbitrary number of terms in the asymptotic expansion (2.6), 
for any sample configuration. Suppose the computation has been carried out 
up to level i = 2M and consider the partial sum 

^A i\ (^^)/ u \ ^ u \ , 5i(a,b,c) gA/(a,b,c) 
(4.1 g^s (a, b, c = go a, b, c + + ■■■ + rj ■ 

Since we do not know its radius of convergence, we should be prepared for 
this sum to diverge eventually as M increases. An important question that 
we address in this section is: How many terms should we use to maximize 
the accuracy of the approximation? 
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4.1. Optimal truncation. As mentioned above, simply adding more and 
more terms to the asymptotic expansion may decrease the accuracy beyond 
a certain point. Optimal truncation is a rule of thumb for truncating the 
partial sum at a point that is expected to maximize its accuracy. More 
precisely, it is defined as follows. 

Definition 4.1 (Optimal truncation rule). Given the first M + 1 terms 
qo{a,h,c), qi{a,h,c), . . . ,qM{a,h,c), in the asymptotic expansion, let M' be 
the index such that |g'j\//(a,b,c)/p^^'| < |g'jv/"(a, b, c)//>^^"|, for all M" / M', 
where M', M" < M. Then, the optimal truncation rule (OTR) suggests trun- 
cating the sum at order M': 

•JotrI^' b, c) = qo{a, b, c) + + • • • + jp . 

The motivation for this rule is that the magnitude of the ith. term in the 
expansion is an estimate of the magnitude of the remainder. More sophisti- 
cated versions of the OTR are available [e.g.. Dingle (1973), Chapter XXI], 
but for simplicity we focus on the definition given above. 

There are two issues with the OTR. First, it minimizes the error only 
approximately, and so, despite its name, it is not guaranteed to be optimal. 
For example, the magnitude of the first few terms in the series can behave 
very irregularly before a pattern emerges. Second, it may use only the first 
few terms in the expansion and discard the rest. As we discuss later, for 
some sample configurations and parameter values of interest, the OTR might 
truncate very early, even as early as M' = 2. This is unrelated to the first 
issue, since the series may indeed begin to diverge very early. Below, we 
discuss a better approximation scheme with a provable convergence property. 

4.2. Fade approximants. The key idea behind Fade approximants is to 
approximate the function of interest by a rational function. In contrast to 
the OTR, Fade approximants make use of all of the computed terms in the 
expansion, even when the expansion diverges rapidly. More precisely, the 
[[//y] Fade approximant of a function is defined as follows. 

Definition 4.2 ([C//F] Fade approximant). Given a function / and 
two nonnegative integers U and V, the [t//y] Fade approximant of / is 
a rational function of the form 

such that -Bo = 1 and 

/(x)-[c//y];(x) = o(x^+^+i). 

That is, the first U + V + 1 terms in a Maclaurin series of the Fade approxi- 
mant [U/V]f{x) matches the first U+V+1 terms in a Maclaurin series of /. 
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Our goal is to approximate the sampling distribution g(a, b,c) by the 
Pade approximant 

(4.2) cz|f/,^l(a,b,c) = [C//F],(,,b,c)(^), 

such that the first U + V + 1 terms in a Maclaurin series of [C^/V^]g(a,b,c)(p) 
agrees with (4.1), where M = U + V. [In this notation, [C^/V^]g(a,b,c)(7) is 
an implicit function of the mutation parameters.] As more terms in (4.1) 
are computed (i.e., as M increases), a sequence of Pade approximants can 
be constructed. This sequence often has much better convergence properties 
than (4.1) itself [Baker and Graves- Morris (1996)]. 

For a given M, there is still some freedom over the choice of U and V. 
As M increases, we construct the following "staircase" sequence of Pade 
approximants: [0/0], [0/1], [1/1], [1/2], [2/2], .... This scheme is motivated by 
the following lemma, proved in Section 7.3. 

Lemma 4.1. Under a neutral, finite- alleles model, the sampling distribu- 
tion g(a, b,c) is a rational function ofl/p, and the degree of the numerator 
is equal to the degree of the denominator. 

This simple yet powerful observation immediately leads to a convergence 
result for the Pade approximants in the following manner. 

Theorem 4.1. Consider a neutral, finite-alleles model. For every given 
two-locus sample configuration (a, b,c), there exists a finite nonnegative 
integer Uq such that for all U >Uq and V > Uq, the Pade approximant 
(7pg^j^5 (a, b,c) is exactly equal to (/(a, b,c) for all p > 0. 

A proof of this theorem is provided in Section 7.4. Note that the staircase 
sequence is the "quickest" to reach the true sampling distribution g(a, b,c). 
Although Theorem 4.1 provides very strong information about the conver- 
gence of the Pade approximants, in practice U and V will have to be in- 
tractably large for such convergence to take place. The real value of Pade 
summation derives from the empirical observation that the approximants 
exhibit high accuracy even before they hit the true sampling distribution. 
The staircase sequence also has the advantage that it exhibits a continued 
fraction representation, which enables their construction to be made com- 
putationally more efficient [Baker and Graves-Morris (1996), Chapter 4]. 

5. Incorporating selection. We now incorporate a model of diploid se- 
lection into the results of Section 3. Suppose that a diploid individual is 
composed of two haplotypes {i,j), {k,l) G [K] x [L], and that, without loss of 
generality, selective differences exist at locus A. We denote the fitness of this 
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individual by 1 + s^ , and consider the diffusion limit in which a^ = 4A^s^ 
is held fixed for each i,k & [K] as N ^ oo. 

In what follows, we use a subscript "s" to denote selectively nonneutral 
versions of the quantities defined above. Results will be given in terms of the 
nonneutral one-locus sampling distribution q^ at locus A and the neutral 
one-locus sampling distribution q^ at locus B. 

5.1. One-locus sampling distribution under selection. For the infinite- 
alleles model, one-locus sampling distributions under symmetric selection 
have been studied by Grote and Speed (2002), Handa (2005) and Huillet 
(2007). In the case of a parent-independent finite-alleles model, the station- 
ary distribution of the one-locus selection model is known to be a weighted 
Dirichlet distribution [Wright (1949)], 

\i=l I \ i=\ fc=l / 

where x G Ax [see (2.1)] and D is a normalizing constant. The one-locus 
sampling distribution at stationarity is then obtained by drawing a multi- 
nomial sample from this distribution: 

K 



(5.1) 9,''(a) = E, 



Thus, under a diploid selection model with parent-independent mutation, 
we are able to express the one-locus sampling distribution at least in inte- 
gral form. There are two integrals that need to be evaluated: one for the 
expectation and the other for the normalizing constant. In practice, these 
integrals must be evaluated using numerical methods [Buzbas, Joyce and 
Abdo (2009)]. 

5.2. Two-locus sampling distribution with one locus under selection. To 
incorporate selection into our framework, we first introduce some further 
notation. 

Definition 5.1. Given two- locus population-wide allele frequencies x G 
^KxL [see (2.3)], the mean fitness of the population at locus A is 



o-^(xA) = X]^ifc^»- 



Xk- 



Selection has an additive eff'ect on the generator of the process, which is 
now given by 

d 



■^« = "^ + 9 I^ ^*J Yl ''^k^k- - O^ (XA 



2 



OXij 
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where J£ is the generator (3.1) of the neutral diffusion process [see, e.g., 
Ethier and Nagylaki (1989)]. Rewriting ^ in terms of the LD variables and 
then rescaling dij as before, we obtain 



^s = ^+^ 



pL(^) + Vpi^«+Lf + -LLM) 



where ^ is as in (3.3), and the new contributions are 
L(^)=0, 
L«=0, 

i,j L \ A: ^ k,k' 



+E^hE< 



'ik^k- 



^^(xa) 



d 

dxi. ' 






ik^k- 



d 

dx.. 



In addition, we replace the asymptotic expansion (3.6) with 

E, [G(-) (X; a, b, r)] = 4") (a, b, r) + ^^^^^^ 



/12 (£^ib,r) 



+ 



ddij 



the corresponding expansion for the expectation of G^™'(X;a, b,r) with 
respect to the stationary distribution under selection at locus A. Finally, 
the boundary conditions (3.9) become the following [Fearnhead (2003)]: 

4°^(e„0,0) = (^f, /ilo)(e„0,0) = Vn>l, 

(5.2) /if (0,e,-,0) = 7rf, /iW(0,e,-,0) = Vu > 1, 

4°^(e,,e,-,O) = 0f7rf, /iW(e„e,-,0) = Vn>l, 

where (p = {(pf)^^^^] is the stationary distribution for drawing a sample of 
size one from a single selected locus. 

With only minor modifications to the arguments of Section 3, each term 
in the array for hu can be computed in a manner similar to Theorem 3.1. 
In particular, entries on odd levels are still zero. Furthermore, as proved in 
Section 7.5, we can update Theorem 2.1 as follows. 
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Theorem 5.1. Suppose locus A is under selection, while locus B is se- 
lectively neutral. Then, in the asymptotic expansion (2.6) of the two-locus 
sampling distribution, the zeroth- and first- order terms are given by (2.7) 
and (2.8), respectively, withq^{a.) in place ofq^{si). Furthermore, the second- 
order term (2.9) may now be decomposed into two parts: 

(5.3) g2,s(a, b, c) = g2,s(a + c^, b + c^, 0) + o-s(a, b, c), 

where (Ts(a, b, c) is given by a known analytic expression and (?2,s(a, b, 0) sat- 
isfies a slightly modified version of the recursion relation (2.10) for g2(a, b, 0). 
(These expressions are omitted for brevity.) 

We remark that the above arguments can be modified to allow for locus B 
also to be under selection, provided the selection is independent, with no 
epistatic interactions, and provided one can substitute (()j for vr^ in (5.2). 

Then, one could also simply substitute g<f (b) for q^{h) in the expressions 
for go(a, b,c) and gi(a, b,c). However, for the boundary conditions (5.2) to 
be modified in this way we would need to extend the result of Fearnhead 
(2003) to deal with two nonneutral loci, and we are unaware of such a result 
in the literature. 

6. Empirical study of accuracy. Li this section, we study empirically the 
accuracy of the approximate sampling distributions discussed in Section 4. 

6.1. Computational details. As discussed earlier, a major advantage of 
our technique is that, given the first M terms in the asymptotic expan- 
sion (2.6), the (M + l)th term can be found and has to be computed only 
once. There are two complications to this statement: first, as mentioned in 
the discussion following Theorem 3.1, the Mth-order term qm for M > 2 

has a contribution [namely, 52m(^'^i^)] ^^^^ ^^ ^^^ known in closed form, 
and is only given recursively. [Recall that the M = 1 case is an exception, 

with ^2 (a, b,0) = for all (a, b,0).] In Jenkins and Song (2009) it was 

observed that the contribution of g\ (a, b,0) to (/2(a, b, c) is generally very 
small, but that its burden in computational time increases with sample size. 
Extrapolating this observation to higher-order terms, we consider making 
the following approximation. 

Approximation 6.1. For ah M > 2, assume 

ffg)f(a,b,O)^0 
for all configurations (a, b,0). 

As we show presently, adopting this approximation has little effect on 
the accuracy of asymptotic sampling distributions. In what follows, we use 
the symbol "°" to indicate when the above approximation is employed. For 
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example, the partial sum qpg (a, b,c) in (4.1) becomes gpg (a, b,c) under 
Approximation 6.1. 

Upon making the above approximation, it is then possible to construct a 
closed- form expression for each subsequent term (7Af(a,b,c). However, there 
is a second issue: Given the effort required to reach the complicated expres- 
sion for q2{sL,h,c) [Jenkins and Song (2009)], performing the computation 
by hand for M > 2 does not seem tractable. Symbolic computation using 
computer software such as Mathematica is a feasible option, but we de- 
fer this for future work. Here, we are interested in comparing the accuracy 
of asymptotic sampling distributions with the true likelihood. Therefore, we 
have implemented an automated numerical computation of each subsequent 
term in the asymptotic expansion, for a given fixed sample configuration and 
fixed mutation parameters. For the samples investigated below, this did not 
impose undue computational burden, even when repeating this procedure 
across all samples of a given size. Exact numerical computation of the true 
likelihood is possible for only small sample sizes (say, up to thirty), so we 
restrict our study to those cases. 

For simplicity, we assume in our empirical study that all alleles are se- 
lectively neutral. Furthermore, we assume a symmetric, PIM model so that 

P"^ = P-^ = f A^ A^ j and take 6a = 6b = 0.01. This is a reasonable ap- 
proximation for modeling neutral single nucleotide polymorphism (SNP) 
data in humans [e.g., McVean, Awadalla and Fearnhead (2002)]. For the 
PIM model, recall that the marginal one-locus sampling distributions are 
available in closed-form, as shown in (2.5). 

6.2. Rate of convergence: An example. To compare the convergence of 
the sequence of partial sums (4.1) with that of the sequence of Pade approx- 
imants (4.2), we re-examine in detail an example studied previously. The 
sample configuration for this example isa = b = 0, c=(2 i)-Iii Jenk- 
ins and Song (2009), we were able to compute the first three terms in the 
asymptotic expansion, obtaining the partial sum g'pg(a,b,c). Applying the 
new method described in this paper, we computed Qpg (a, b, c) for M < 11 
[including the recursive terms 5^ (a, b,0) discussed above], and also the 

corresponding staircase Pade approximants gp^^^^ (a, b,c). Results are il- 
lustrated in Figure 2, in which we compare various approximations of the 
likelihood curve for p with the true likelihood curve. Here, the likelihood of 
a sample is defined simply as its sampling distribution g(a,b,c) treated as 
a function of p, with 9a and 9b fixed at 0.01. 

Figure 2(a) exhibits a number of features which we also observed more 
generally for many other samples. For fixed p in the range illustrated, the se- 
quence (^ps , 9ps , ^ps ) • • •) of partial sums diverges eventually, and, for many 
realistic choices of p, this divergence can be surprisingly early. Arguably 
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Fig. 2. Likelihood curves for p, comparing different levels of truncation, for the sample 
a = b = 0, €=(2" j)-^ symmetric PIM model with 6A = dB = 0.01 is assumed. The true 
likelihood is shown as a thick black line. Approximate likelihood curves for various levels 
of truncation, M — 0,1,. . . , 11 are shown as thinner gray lines, (a) Partial sums gpg , la- 
beled by M. The bottom row of indices records M' , the level of truncation in gQrpJ^(a, b,c) 
recommended by the OTR. The row above records the actual level of truncation that min- 
imizes the unsigned relative error, (b) Staircase Pade approximants q , , some labeled 

by \U /V\. To the naked eye, gL (- for most \U /V] are indistinguishable from the true 
likelihood curve. 



(2) 

the best overall fit in the figure is gpg , though for p > 70 good accuracy 

is maintained by gpg ' for all 1 < M < 10. Divergence is extremely rapid if 

we add any further terms; witness the curve for q^^ . Of course, in real ap- 
plications the true likelihood will be unavailable and we might rely on the 
aforementioned optimal truncation rule to guide the truncation point. Here, 
it performs reasonably well, correctly picking the most accurate index across 
most of the range [compare the bottom two rows of indices in Figure 2(a)]. 

In contrast. Figure 2(b) shows that there is much less risk in using "too 
many" terms in constructing the Pade approximants. They approximate 
the true likelihood very accurately and very quickly; to the naked eye, Pade 
approximants ^pj^^j,; for most [C//l^] are indistinguishable from the true like- 
lihood curve. Indeed, this example suggests that there is very little gain in 
accuracy beyond Qp^<j^, but that there is no significant loss beyond it either. 
(It should be pointed out, however, that achievement of such high accuracy 
so early in the sequence of approximants does not seem to be a common oc- 
currence across samples. Usually, several more terms are required; see next 
section for further details.) 

There is one further important observation to be made regarding Pade 
approximants. There is nothing to prevent the polynomial in the denomina- 
tor from having positive real roots, thus creating singularities. This is indeed 
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Table 2 
Nonnegative, real roots in the numerator and denominator of the [U/V] Fade 

10 7^ 
2 1, 



approximants for the sample a = b = 0,c^''^ "' 



[U/V] 


Roots of numerator 


Roots of denominator 


[0/0] 










[0/1] 


0.000 








[1/1] 


4.871 








[1/2] 


0.000 








[2/2] 










[2/3] 


0.000 




0.912 




[3/3] 










[3/4] 


0.000 








[4/4] 










[4/5] 


0.000 








[5/5] 










[5/6] 


0.000 


8,474,538.140 


8,474,538.140 




[6/6] 










[6/7] 


0.000 


306.846 


306.846 




[7/7] 










[7/8] 


0.000 








[8/8] 


82.033 




82.032 




[8/9] 


0.000 


77.366 


0.284 


77.364 


[9/9] 


82.121 


4,412.751 


82.120 


4,412.751 


[9/10] 


0.000 








[10/10] 


5.543 




4.252 





[2/31 

observed once in Figure 2(b); ^p^^ji exhibits a singularity at p = 0.9. To ex- 
amine this behavior further, in Table 2 we tabulate the nonnegative real 
roots of the numerator and denominator of each approximant in the stair- 
case sequence up to [10/10]. We see some interesting patterns: (i) The total 
number of nonnegative real roots does not seem to grow with M. (ii) Roots 
in the denominator are almost invariably accompanied by a nearby root in 
the numerator, and their proximity is usually extremely close, with agree- 
ment to several decimal places, (iii) Pairs of roots appear transiently; the 
roots of one approximant in the sequence provide almost no information 
regarding the next. Such pairs of roots are known as defects, and are an 
inevitable feature of Pade approximants [Baker and Graves- Ivlorris (1996), 
page 48]. Provided we anticipate them, defects need not act as a serious 
obstacle. Because of the proximity of the two roots in a pair, they almost 
behave like removable singularities. In particular, Pade approximants are 
still well behaved outside a reasonably narrow interval around the defect, 
and can still approximate the likelihood accurately. In light of these obser- 
vations, henceforth we use the following simple heuristic for dealing with 
defects. 
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Definition 6.1 (Defect heuristic). Suppose we are interested in approx- 
imating the hkehhood curve at a particular value po and have the resources 
to compute up to M in the partial sum (4.1). Then proceed as follows. 

(1) Initialize with M" = M. 

(2) Construct the [f//l^] Pade approximant in the staircase sequence, 
where U + V = M". 

(3) If it exhibits a root in the interval {po — e, po + e)ri[0,oo), either in its 
numerator or denominator, then decrement M" by one and go to step (2), 
otherwise use this approximant. 

Choice of threshold e involves a trade-off between the disruption caused 
by the nearby defect and the loss incurred by reverting to the previous Pade 
approximant. Throughout the following section, we employ this heuristic 
with e = 25, which seemed to work well. 

6.3. Rate of convergence: Empirical study. To investigate to what extent 
the observations of Section 6.2 hold across all samples, we performed the 
following empirical study. Following Jenkins and Song (2009), we focused 
on samples of the form (0, 0, c) for which all alleles are observed at both 
loci, and measured the accuracy of the partial sum (4.1) by the unsigned 
relative error 

gg)(0,0,c)-g(0,0,c) 



et?^(0,0,c) 



g(0,0,c) 



X 100%. 



An analogous definition can be made for ep^J^^, the unsigned relative error of 

the staircase Pade approximants ^'p^^^j^ , where U = \_M/2\ and V = \M/2\ . 
When Approximation 6.1 is additionally used, the respective unsigned errors 

are denoted epg and Cp^^^ . These quantities are implicit functions of the 
parameters and of the sample configuration. In our study, we focused on 
sufficiently small sample sizes so that the true sampling distribution g(a, b, c) 
could be computed. Specifically, we computed g(0, 0, c) for all samples of 
a given size c using the method described in Jenkins and Song (2009). By 
weighting samples according to their sampling probability, we may compute 

the distributions of epg and ep,^l^ , and similarly the distributions of epg 
and &p^^^. Table 3 summarizes the cumulative distributions of Cpg and 
^Padc ^°^ P ~ ^'-'' across all samples of size c = 20 that are dimorphic at both 
loci. The corresponding table for epg and ep^^^j^ was essentially identical 
(not shown), with agreement usually to two decimal places. This confirms 
that utilizing Approximation 6.1 is justified. 

Table 3 illustrates that the observations of Section 6.2 hold much more 
generally, as described below. 
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Cumulative distribution $(a;) 



Table 3 
^^(A/) ^ ^0^^ (where eS'^^' denotes either Cpg or e .) 

of the unsigned relative error of the partial sum Qpg and the corresponding Fade 
approximants, for all samples of size 20 dimorphic at both loci. Here, p — 50, and 

Approximation 6. 1 is used 



M 


Type of sum 


*(1) 


*(5) 


*(10) 


*(25) 


*(50) 


*(100) 





PS 


0.49+ 


0.56+ 


0.63 


0.81 


0.98 


1.00 




Pade 


0.49 


0.56 


0.63 


0.81 


0.98 


1.00 


1 


PS 


0.51 


0.74 


0.87 


0.99 


1.00 


1.00 




Pade 


0.59 


0.77 


0.84 


0.91 


0.98 


0.99 


2 


PS 


0.59 


0.87 


0.92 


1.00 


1.00 


1.00 




Pade 


0.77 


0.97 


0.98 


0.99 


1.00 


1.00 


3 


PS 


0.57 


0.86 


0.97 


1.00 


1.00 


1.00 




Pade 


0.91 


0.96 


0.98 


0.99 


1.00 


1.00 


4 


PS 


0.45 


0.62 


0.87 


0.98 


0.98 


1.00 




Pade 


0.95 


0.99 


1.00 


1.00 


1.00 


1.00 


5 


PS 


0.30 


0.50 


0.61 


0.76 


0.79 


0.97 




Pade 


0.98 


1.00 


1.00 


1.00 


1.00 


1.00 


10 


PS 


0.00 


0.02 


0.03 


0.07 


0.08 


0.11 




Pade 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 




OTR 


0.25 


0.36 


0.65 


0.89 


0.90 


1.00 



TThese two values were misquoted in the text of Jenkins and Song [(2009), page 1093]; 
this table corrects them. 



(1) The error Cpg for partial sums is not a monotonically decreasing 

function of M; that is, the accuracy of gpg improves as one adds more 
terms up to a certain point, before quickly becoming very inaccurate. 

(2) Empirically, the actual optimal truncation point for the parameter 
settings we considered is at M' = 2 or M' = 3, which perform comparably. 
Moreover, both provide consistently higher accuracy than employing the 
OTR, which is a serious issue when we wish to use this rule without external 
information about which truncation point really is the most accurate. 

(3) Overall, using Pade approximants is much more reliable. Note that 
the accuracy of Pade approximants continues to improve as we incorporate 
more terms. For a sample drawn at random, the probability that its Pade 
approximant is within 1% of the true sampling distribution is 1.00, compared 
to 0.59 or 0.57 for truncating the partial sums, respectively, at M' = 2 or 
M' = 3, and only 0.25 for using the OTR. 

For the remainder of this section, we focus on the accuracy of the stair- 
case Pade approximants. It was shown in Jenkins and Song (2009) that the 
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Table 4 

Effect of p on the cumulative distribution ^{x) = F{e K < x%) of the unsigned relative 

error of the Pade approximants, for all samples of size 20 dimorphic at both loci 



M 


P 


*(1) 


*(5) 


*(10) 


*(25) 


*(50) 


*(100) 





25 


0.39 


0.52 


0.58 


0.69 


0.84 


1.00 




50 


0.49 


0.56 


0.63 


0.81 


0.98 


1.00 




100 


0.50 


0.61 


0.72 


0.97 


1.00 


1.00 




200 


0.54 


0.72 


0.95 


0.99 


1.00 


1.00 


1 


25 


0.51 


0.62 


0.75 


0.85 


0.94 


0.96 




50 


0.59 


0.77 


0.84 


0.91 


0.98 


0.99 




100 


0.74 


0.91 


0.95 


0.98 


0.99 


1.00 




200 


0.90 


0.98 


0.99 


1.00 


1.00 


1.00 


2 


25 


0.59 


0.82 


0.91 


0.94 


0.95 


0.97 




50 


0.77 


0.97 


0.98 


0.99 


1.00 


1.00 




100 


0.95 


1.00 


1.00 


1.00 


1.00 


1.00 




200 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


3 


25 


0.64 


0.91 


0.95 


0.96 


0.98 


1.00 




50 


0.91 


0.96 


0.98 


0.99 


1.00 


1.00 




100 


0.99 


1.00 


1.00 


1.00 


1.00 


1.00 




200 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


4 


25 


0.83 


0.96 


0.99 


0.99 


1.00 


1.00 




50 


0.95 


0.99 


1.00 


1.00 


1.00 


1.00 




100 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 




200 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


5 


25 


0.82 


0.94 


0.98 


0.99 


1.00 


1.00 




50 


0.98 


1.00 


1.00 


1.00 


1.00 


1.00 




100 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 




200 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 


10 


25 


0.97 


0.99 


0.99 


1.00 


1.00 


1.00 




50 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 




100 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 




200 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 



(2) 

accuracy of the partial sum gpg increases with p, but (perhaps surprisingly) 
decreases with increasing sample size. In Tables 4 and 5, we address the 
same issue for the Pade approximant. Table 4 confirms that accuracy in- 
creases with p, as one might expect. Furthermore, it is also the case that 
substantial accuracy is achievable even for moderate values of p (say, p = 25), 
provided that sufficiently many terms are utilized in the construction of the 
Pade approximant. For example, when M = 5 the probability that the Pade 
approximant of a sample drawn at random is within 5% of the truth is 
0.94. Also very encouraging is the pattern shown in Table 5. Provided that 
sufficiently many terms are used, the accuracy of the Pade approximant is 
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Table 5 

Effect of sample size c on the cumulative distribution ${x) =¥{e , < x%) of the 

unsigned relative error of the Pade approximants, for all samples of size c dimorphic 

at both loci. Here p = 50 



M 


c 


*(1) 


*(5) 


*(10) 


*(25) 


*(50) 


$(100) 





10 


0.58 


0.67 


0.72 


0.96 


1.00 


1.00 




20 


0.49 


0.56 


0.63 


0.81 


0.98 


1.00 




30 


0.44 


0.50 


0.58 


0.72 


0.91 


1.00 


1 


10 


0.72 


0.90 


0.94 


0.97 


1.00 


1.00 




20 


0.59 


0.77 


0.84 


0.91 


0.98 


0.99 




30 


0.53 


0.69 


0.76 


0.85 


0.95 


0.97 


2 


10 


0.94 


1.00 


1.00 


1.00 


1.00 


1.00 




20 


0.77 


0.97 


0.98 


0.99 


1.00 


1.00 




30 


0.62 


0.89 


0.95 


0.98 


0.99 


0.99 


3 


10 


0.98 


1.00 


1.00 


1.00 


1.00 


1.00 




20 


0.91 


0.96 


0.98 


0.99 


1.00 


1.00 




30 


0.81 


0.92 


0.94 


0.98 


0.99 


1.00 


4 


10 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 




20 


0.95 


0.99 


1.00 


1.00 


1.00 


1.00 




30 


0.90 


0.99 


0.99 


1.00 


1.00 


1.00 


5 


10 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 




20 


0.98 


1.00 


1.00 


1.00 


1.00 


1.00 




30 


0.86 


0.99 


0.99 


1.00 


1.00 


1.00 


10 


10 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 




20 


1.00 


1.00 


1.00 


1.00 


1.00 


1.00 




30 


0.99 


1.00 


1.00 


1.00 


1.00 


1.00 



only slightly affected by looking at larger sample sizes. For example, when 
M = 5 the probability that the Fade approximant of a randomly drawn 
sample of size 30 is within 5% of the truth is 0.99, compared with 1.00 
for a sample of size 20. This loss in accuracy is much less severe than the 
corresponding loss in accuracy of the partial sums; for gpg , the highest ac- 
curacy is achieved for M = 2, in which case which the corresponding loss in 
accuracy is from 0.87 when c = 20, to 0.70 when c = 30. 

7. Proofs. In this section, we provide proofs of the results presented 
earlier. 

7.1. Proof of Theorem 3.1. For an infinitesimal generator ^2/ of a dif- 
fusion process on the state space O and a twice continuously differentiable 
function /i : $7 — )■ M with compact support, it is well known that 

E[^/i(X)]=0, 
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where expectation is with respect to the stationary distribution of X. Apply 
this result to the generator ^ shown in (3.3) and monomial G^'"^(x;a, b,r) 
shown in (3.5). This provides a linear equation relating the expectations 
E[G(™+i)(X;a', b',r')], E[G('")(X;a',b',r')], E[G(™-i)(X;a',b',r')] and 
E[G(™'~^)(X;a',b', r')], each appearing with various different arguments 
(a', b', r') depending on (a, b, r); we omit the simple but algebraically lengthy 
details. Now, for these four choices of m we substitute the proposed expan- 
sion (3.6). If m > 0, then compare the coefficients of p^~"'^ in the resulting 
expression; if tti = 0, then compare the coefficients of p~"' ^. We then obtain 
the following: 



(i) If ?n, = and u = 0, then the resulting expression is 

\a{a - 1 + ^a) + ?>(6 - 1 + QB)\9f (a, b, 0) 

= J^ a, (a, - 1)5^°) (a - e„ b, 0) + ^ 6,- (6,- - l)gj°) (a, b - e„ 0) 



+ 0A J^ a^Pt9f (a - ei + e^, b, 0) 



(7.1) 

+ 0B5]6,P;f<?J°)(a,b-e,+e;,O) 

with boundary conditions given by (3.9). This is the sum of two copies of 
a familiar recursion [Griffiths and Tavare (1994)] for the sampling distribu- 
tion of a single locus, one for locus A and one for locus B. In our notation 
the solution is, therefore, 

5r(a,b,0) = g^(a)g^(b). 
(ii) If ?n, > 0, then the resulting expression is 
m5(™)(a,b,r) 

Y^ \ nj {rij - 1)9^'^^ (a + ei , b + ej , r - 2e,j ) 

~ X] ''*J ^'^^'- ~ ^Ji^air'"^^ {ai + ei,h + ej + ei,r- eij - eu ) 
I 

- X] '^iji'^kj - Sik)9lr~^^ {a. + ei + efc, b + e^, r - e^ - e^j) 

k 

+ X rij{rki - 6ik5ji)gl]^~'^^ {a. + ei + e^, b + ej + ei,r- etj - e^z) 

k,l 
+ i^ijirij - l)9u^i (a, b, r - e.^) 



«j 
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- 2rij {n. - l)9l^i (a, b + ej , r - ejj) 

- 2rij {r.j - l)gi™7^^ (a + ej, b, r - Bij) 

+ 2 ^ rkjiui - hk5ji)gu^i (a + e^, b + e^, r - e^j - en + eij) 

k,l 

+ 2{m- l)rijg^^'l^\a. + ei,h + ej,Y-eij) \ 
(7.2) + Y,<'^i + 2ri. - l)5S(a- e„b,r) 

i 

+ 5^6,(6, +2r.,-l)5S(a,b-e,,r) 
i 

- 2 ^ ^ aiVkjg^Jllia - e^ + e^, b, r - e^j + e^) 

i,j k 

- 2 ^ ^ bjrug^^l (a, b - e^ + e,, r - e^, + e^j ) 



i,j I 



i,k 



+ X] ''ii^i-2(^' b, r - ejj + e 



fcj; 



+ ^bE^' 



hi 



^i^i 2(a,b-ej +ei,r) 



+ X] '^«i 5^1-2 (a, b, r - eij + e^i) 



-[(a + m)(a + m + 6'A-l) 

+ {h + m){h + m + 6B-l) - m(m - 3)]5r|j_2(a'b,r) 
+ 2 E aj6j5'|j'^3 ^^ (a - ej, b - ej , r + ejj). 

Equation (7.2) relates gu (a, b, r) to the known expressions gu , gu-i ' 

gy^2 ^^'^ 9m-3 ' ^^ claimed. 

(iii) If 771 = and n > 1, then the resulting expression is 

[a{a + 0A - 1) + fe(& + 0a- l)]5i°^ (a, b, 0) 

= E ai (ai - 1)9^ (a - e„ b, 0) + E 6,- (6,- - l)g W (a, b - e„ 0) 
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(7.3) +^Aj^aiPfc1fff (a-ei + efc,b,0) 






with boundary conditions (3.9). Hence, this provides a recursion relation for 
Qu (a, b, 0) when g\^_i is known. 

7.2. Proof of Corollary 3.1. For the base case ^ = 1, note that if m = 1 
and u = then (7.2) simpHfies to 

5S'^(a,b,r)=0, 
and hence, if to = and u = l then (7.3) siniphfies to 

[a{a + 9a-1) + b{b + ^a - l)]9?^ (a, b, 0) 

(7.4) 

+ 0A 5^ aiP^^gf"^ (a - e, + e^, b, 0) 

i,k 

+ eBY^ bjP/^gf^ (a, b - e, + e^, 0) 

with boundary conditions (3.9). This has solution 

<7f^(a,b,0) = 0, 

which is unique [since g(a, b,c) is unique]. This completes i = l. Now sup- 
pose inductively that gu (a, b, r) = for every m, u such that m + u = 1 — 2 
where ^ is a fixed odd number greater than 1. Then for m, u such that 
m + u = I., (7.2) becomes 

5^)(a,b,r)=0 
as required. 

7.3. Proof of Lemma 4-i- In what follows, define the length of a sam- 
ple configuration (a, b, c) to be a + b + 2c. Under a neutral, finite-alleles 
model, the probability of a sample with length 6 satisfies a closed system of 
equations [e.g., see equation (5) of Jenkins and Song (2009)] which can be 
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expressed in matrix form: 

Mq = v, 

where q is a vector composed of the probabihties of samples of length less 
than or equal to 6, v is a constant vector of the same dimension as q and M 
is an invertible matrix (since the solution to this equation is unique). The 
entries of M and v are rational functions of p, and hence, q = M^^v is 
a vector each of whose entries is a rational function of p. 

Let Uq denote the degree of the numerator, and Vq the degree of the de- 
nominator. If Uq > Vq, then g(a, b, c) becomes unbounded as p — )• oo, while if 
Vo > Uq then q{a, b, c) — )• as p — )■ oo. But we know that q'(a, b, c) is a prob- 
ability, and hence, bounded. Moreover, it has support over all samples of 
a fixed size, since we assume that P and P-^ are irreducible. Thus, to en- 
sure limp_5.oo ?(&) b, c) G (0, 1) we must have Uo = Vo. By a similar argument 
as /9 — 7- 0, we must have that the coefficients of p^ are nonzero both in the 
numerator and in the denominator. We can, therefore, divide the numerator 
and denominator by p^° to obtain a rational function of 1/p whose degree 
in the numerator and denominator are both Uq (=Vq). 

7.4. Proof of Theorem 4-1- This is an application of Theorem 1.4.4 of 
Baker and Graves-Morris (1996), which we spell out for completeness. By 
Lemma 4.1, q{a,h,c) is a rational function of 1/p and is analytic at /9 = oo 
with Taylor series (2.6). Denote the degree of its numerator and denominator 
by Uq. Then, q{si, b, c) has Uq + Uq + I independent coefficients determined 
by the first Uq + Uq + I terms of its Taylor series expansion. Thus, provided 
U >Uo and V >Uq, by the definition of the [f//V^] Fade approximant, it 
must coincide uniquely with q{a,h,c). 

7.5. Proof of Theorem 5.1. This is simply an application of Theorem 3.1 
applied to the generator for the diffusion under selection, ^g, rather than ^, 
and so we just summarize the procedure. 

The change in generator results in slight modifications to the relationships 
between the (7|j™ (a, b,r) in order to obtain the relationships between the 
/lu (a, b,r) for each m and u: 

(1) Hq {aL,h,0) satisfies (7.1) (replacing each g^ with h^ ), but with 
extra terms 



i,k 



+ ^ aj CFikh^Q^ (a + efc, b, 0) - ^ CF^k'hf^ (» + ^k + efc', b, 0) 



on the right-hand side. The solution is 



hf{a,h,Q)=qt{a)q^{h). 
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(2) For m > 0, /iq (a, b,0) satisfies (7.2) but with extra terms 



+ ^(oi + ri.)afkh^^\{a + e/,, b, r) 

i,k 

(7.5) -(a + m)^cj4'/il"!2(a + efc + efc',b,r) 



- X] ''u^fcfc'^i-2(a + ei + efc,b,r-eij+efc/j) 

i,j,k,k' 

on the right-hand side. 

(3) For m = and u > 1, /iq (a, b, 0) satisfies (7.2) but with exi 

+ V[a,af,/ilo) (a + e^, b, 0) - aaf^h^^^ (a + e, + e^, b, 0)] 



i.k 



+ Yl ^i^ifc^l-i(^ + efc, b - ej, eij) 

on the right-hand side. 

Using these equations to evaluate hu (a, b, r) on levels £ = 0, 1, . . . , 4 pro- 
vides expressions for the nonneutral versions of go(a, b,c), gi(a, b,c) and 
q2{a,h,c). Those for go(a, b,c) and gi(a,b,c) in terms of the relevant one- 
locus sampling distributions are unchanged, while the new generator makes 
some minor modifications to the expression for q2{a, b, c). The analytic part 
of this term, (Ts(a,b,c), is easily calculated from /ig , h^ , /12 and /13 , 
while the recursive part, q2.s{si. + CA,h -\-cb,0), follows from h^ . 
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